Hybrid Microdata via Model-Based Clustering
Abstract
In this paper we propose a new scheme for statistical disclosure limitation which can be classified as a hybrid method of protection, that is, a method that combines properties of perturbative and synthetic methods. This approach is based on model-based clustering with the subsequent synthesis of the records within each cluster. The novelty is that the clustering and synthesis methods have been carefully chosen to fit each other in view of reducing information loss. The model-based clustering tries to obtain clusters such that the within-cluster data distribution is approximately normal; then we can use a multivariate normal synthesizer for the local synthesis of data. In this way, some of the nonnormal characteristics of the data are captured by the clustering, so that a simple synthesizer for normal data can be used within each cluster. Our method is shown to be effective when compared to other disclosure limitation strategies.
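The local synthesis step described in the abstract can be illustrated with a minimal sketch. It assumes cluster labels have already been produced by some model-based clustering step, which is not shown. Within each cluster it fits a multivariate normal (sample mean and covariance) and draws one synthetic record per original record. The function names (`mean_and_cov`, `cholesky`, `synthesize_by_cluster`), the `jitter` term, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def mean_and_cov(rows):
    """Sample mean vector and unbiased sample covariance matrix of a cluster."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two records per cluster")
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a, jitter=1e-9):
    """Lower-triangular factor L with L * L^T approximately equal to a.

    The small jitter (an assumption of this sketch) keeps the factorization
    defined when a cluster's covariance is nearly singular.
    """
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 0.0) + jitter)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def synthesize_by_cluster(records, labels, seed=0):
    """Replace each record with a draw from its cluster's fitted normal."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)

    synthetic = [None] * len(records)
    for idxs in groups.values():
        mu, cov = mean_and_cov([records[i] for i in idxs])
        low = cholesky(cov)
        d = len(mu)
        for i in idxs:
            # x = mu + L z, with z a vector of independent standard normals
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            synthetic[i] = [mu[r] + sum(low[r][c] * z[c] for c in range(r + 1))
                            for r in range(d)]
    return synthetic


# Toy data: two well-separated clusters with hypothetical labels.
demo_records = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05], [0.05, -0.1],
                [100.0, 100.1], [100.1, 99.9], [99.9, 100.05], [100.05, 100.0]]
demo_labels = [0, 0, 0, 0, 1, 1, 1, 1]
demo_synthetic = synthesize_by_cluster(demo_records, demo_labels, seed=1)
```

Because each synthetic record is drawn only from its own cluster's fitted distribution, between-cluster structure such as multimodality is carried by the labels rather than by the synthesizer, which is the division of labor the abstract describes.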
Similar resources
A New WordNet Enriched Content-Collaborative Recommender System
Recommender systems are models that predict the potential interests of users among a number of items. These systems are widespread and have many real-world applications. They are generally based on one of two structural types: collaborative filtering and content filtering. Some systems are based on both; these are named hybrid recommen...
Evaluation of per-record identification risk and swappability of records in a microdata set via decomposable models
We propose a strategy for disclosure risk evaluation and disclosure control of a microdata set based on fitting decomposable models of a multiway contingency table corresponding to the microdata set. By fitting decomposable models, we can evaluate per-record identification (or re-identification) risk of a microdata set. Furthermore we can easily determine swappability of risky records which doe...
MATHEMATICAL ENGINEERING TECHNICAL REPORTS Evaluation of per-record identification risk and swappability of records in a microdata set via decomposable models
We propose a strategy for disclosure risk evaluation and disclosure control of a microdata set based on fitting decomposable models of a multiway contingency table corresponding to the microdata set. By fitting decomposable models, we can evaluate per-record identification (or re-identification) risk of a microdata set. Furthermore we can easily determine swappability of risky records which doe...
Segmenting Public Library Clients by Their Needs Using an Artificial Neural Network, the Analytic Hierarchy Process, and the Kano Model
Purpose: Clients are crucial factors in the success of public libraries and each of them has different needs. So public libraries should know their clients and plan to meet their needs in order to ensure satisfaction. Methodology: In this research a hybrid model based on clustering method which uses the Neural Network, Analytical Hierarchy Process (AHP) and Kano model is used in order to segm...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Publication date: 2012